Capturing the space of shape variations poses several challenges. First, individual shapes can have
different representations, such as images, surface meshes, or point clouds; second, one needs a unified setting
for representing both continuous deformations and structural changes; third, shape edits are not
directly expressed but are only implicitly contained in shape collections; and finally, learning a space
of structural variations that is applicable to more than a single shape amounts to learning mappings
between different shape edit distributions, since different shapes have different types and numbers of parts
(for example, tables with or without leg bars).
In much of the existing literature on 3D machine learning (ML), 3D shapes are mapped to points in a
representation space whose coordinates encode latent features of each shape. In such a representation,
shape edits are encoded as vectors in that same space, in other words, as differences between points
representing shapes. Equivalently, we can think of shapes as “anchored” vectors rooted at the origin, while
shape differences are “floating” vectors that can be transported around in shape space. This type of vector
space arithmetic is commonly used [wu2016learning, achlioptas2017learning, wang2018global, gao2018automatic, xia2015realtime,
Villegas_2018_CVPR], for example, in performing analogies, where the difference vector from
latent point A to point B (that is, B − A) is added to point C to produce an analogous point D. The challenge with
this view in our setting is that while Euclidean spaces are perfectly homogeneous and vectors can be
comfortably transported and added to points anywhere, shape spaces are far less so. While for
continuous variations, a vector space model has some plausibility, this is not so for structural variations:
the “add arms” vector does not make sense for the point representing a chair that already has arms. We
take a different approach. We consider embedding shape differences, or deltas, directly in their own
latent space, separate from the general shape embedding space. Encoding and decoding such shape
differences is always done through a variational autoencoder (VAE), in the context of a given source
shape, itself encoded through its part hierarchy. This has a number of key advantages: (i) it allows
compact encodings of shape deltas, since in general we aim to describe local variations; (ii) it encourages
the network to abstract commonalities in shape variations across the shape space; and (iii) it adapts the edit to the
provided source shape, suppressing modes that are semantically implausible.
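
The contrast drawn above, between context-free vector arithmetic and an edit interpreted in the context of its source shape, can be illustrated with a small toy sketch. It is not the StructEdit network: the three-dimensional latent codes, the flat part lists, and the helper names (vec_sub, vec_add, add_arms_context_free, add_arms_conditioned) are all assumptions invented for illustration. The first half performs the analogy D = C + (B − A); the second half shows why a structural edit such as “add arms” needs to look at the source shape before it is applied.

```python
# Toy illustration only: the latent codes and part lists below are invented,
# and the "conditioned" edit is a hand-written rule, not a learned VAE decoder.

def vec_sub(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]

def vec_add(u, v):
    """Element-wise sum u + v of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical latent codes for shapes A, B, and C.
z_a = [0.0, 1.0, 2.0]
z_b = [0.5, 1.0, 3.0]
z_c = [2.0, -1.0, 0.0]

# Analogy "A is to B as C is to D": transport the floating vector B - A to C.
edit_vector = vec_sub(z_b, z_a)
z_d = vec_add(z_c, edit_vector)

def add_arms_context_free(parts):
    """Apply the edit blindly, like adding a fixed vector to any point."""
    return parts + ["arm", "arm"]

def add_arms_conditioned(parts):
    """Apply the edit only if it is plausible for this source shape."""
    if "arm" in parts:
        return list(parts)  # the source already has arms: suppress the edit
    return parts + ["arm", "arm"]

armless_chair = ["seat", "back", "leg", "leg", "leg", "leg"]
armchair = armless_chair + ["arm", "arm"]

print(z_d)                                            # [2.5, -1.0, 1.0]
print(add_arms_context_free(armchair).count("arm"))   # 4: implausible result
print(add_arms_conditioned(armchair).count("arm"))    # 2: edit suppressed
print(add_arms_conditioned(armless_chair).count("arm"))  # 2: edit applied
```

In this sketch the conditioning is a hard-coded rule; in the approach described above, the corresponding behavior would instead be learned by encoding and decoding deltas in the context of the source shape.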
We have extensively evaluated StructEdit on publicly available shape datasets. We introduce a
new synthetic dataset with ground-truth shape edits to quantitatively evaluate our method and compare
it against baseline alternatives. We then provide evaluation results on the PartNet
dataset [mo2019partnet] and provide ablation studies. Finally, we demonstrate that extensions of our
method allow the handling of both images and point clouds as shape sources, can predict plausible edit
modes from single shape examples, and can also transfer example shape edits on one shape to other shapes
in the collection.